Use 'in <literal>' instead of str.isupper/isspace/islower in gen_moves (+3.7% nps)#132

Open
simin75simin wants to merge 1 commit into thomasahle:master from simin75simin:perf-gen-moves-in-literal

Summary

Replace three Python str-method calls in Position.gen_moves with 'in <literal>' membership checks against string literals. The search tree is unchanged; the only thing that changes is how each board character is classified inside the move-generator loop.

original                                       patched
q.isupper() (method call)                      q in "PNBRQK"
q.isspace() or q.isupper() (two method calls)  q in " \nPNBRQK"
q.islower() (method call)                      q in "pnbrqk"
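
The swap only preserves behaviour if both forms agree on every character the board can contain. A minimal sketch of that check, assuming the board alphabet is space, newline, '.', and the twelve piece letters (the helper names are hypothetical and not part of the patch):

```python
# Assumed board alphabet: padding, newline, empty square, white and black pieces.
BOARD_CHARS = " \n.PNBRQKpnbrqk"

def classify_methods(q):
    # The three original str-method checks.
    return (q.isupper(), q.isspace() or q.isupper(), q.islower())

def classify_literals(q):
    # The three patched literal-membership checks.
    return (q in "PNBRQK", q in " \nPNBRQK", q in "pnbrqk")

mismatches = [q for q in BOARD_CHARS
              if classify_methods(q) != classify_literals(q)]
print(mismatches)  # → []
```

The two forms are not equivalent in general ("A".isupper() is True, while "A" in "PNBRQK" is False), so the patch relies on the board never holding characters outside this alphabet.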

Why

gen_moves accounted for ~67% of CPython search time in a cProfile run of a 5-ply search from startpos. Inside it, the three str-methods above are called millions of times per search, and each call costs an attribute lookup on the str object plus a call into C. An 'in' test against a string literal is faster because CPython compiles it to a dedicated opcode (CONTAINS_OP), which avoids the method lookup and call overhead.
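
The bytecode difference can be inspected with the standard dis module. This is a small sketch, assuming CPython 3.9 or later (where CONTAINS_OP exists); the exact opcode names for the method call vary between versions, so no output is listed here.

```python
import dis

def opnames(expr):
    """Return the bytecode instruction names CPython emits for an expression."""
    code = compile(expr, "<expr>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

method_ops = opnames("q.isupper()")     # attribute/method load followed by a call
literal_ops = opnames('q in "PNBRQK"')  # constant load followed by CONTAINS_OP

print(method_ops)
print(literal_ops)
```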

timeit over a 120-char board scan, 100k iters, CPython 3.13:

c.isupper()                  3.33 us
c in "PNBRQK"                2.61 us   (~22% faster)
c.isspace() or c.isupper()   4.97 us
c in " \nPNBRQK"             3.34 us   (~33% faster)
c.islower()                  ~     (similar order)
c in "pnbrqk"                ~
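
The harness behind these numbers is not included in the PR. The sketch below is one way such a measurement could be set up, assuming a sunfish-style 120-character padded start position and reporting the mean time per full-board scan; scan_time_us and its parameters are hypothetical, and absolute timings will differ by machine.

```python
import timeit

# Assumed layout: 12 rows of 9 characters plus a newline (120 chars total).
BOARD = (
    "         \n"
    "         \n"
    " rnbqkbnr\n"
    " pppppppp\n"
    " ........\n"
    " ........\n"
    " ........\n"
    " ........\n"
    " PPPPPPPP\n"
    " RNBQKBNR\n"
    "         \n"
    "         \n"
)

def scan_time_us(expr, number=100_000):
    """Mean microseconds for one board scan that evaluates `expr` per char."""
    stmt = f"for c in board: {expr}"
    total = timeit.timeit(stmt, globals={"board": BOARD}, number=number)
    return total / number * 1e6

EXPRESSIONS = [
    "c.isupper()",
    'c in "PNBRQK"',
    "c.isspace() or c.isupper()",
    'c in " \\nPNBRQK"',
]

# A smaller iteration count than the PR's 100k keeps this demo quick.
results = {expr: scan_time_us(expr, number=10_000) for expr in EXPRESSIONS}
for expr, us in results.items():
    print(f"{expr:<30} {us:6.2f} us")
```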

Speedup

A 6-position suite (startpos + 5 typical openings/middlegames) at fixed depth 5, 5 runs each, CPython 3.13 on Windows:

              original          patched
nps           16,711 – 17,240   17,592 – 17,685
mean          17,030            17,656
speedup       —                 +3.7%

The min-max ranges of the two versions don't overlap across 5 runs, so the gain is reproducible rather than noise.
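
For reference, the headline figure and the non-overlap claim follow directly from the table. A short sketch of the arithmetic, using only the numbers reported above (variable names are illustrative):

```python
# nps figures copied from the table above.
orig_min, orig_max, orig_mean = 16_711, 17_240, 17_030
new_min, new_max, new_mean = 17_592, 17_685, 17_656

# Relative change in mean nodes per second.
speedup_pct = (new_mean / orig_mean - 1) * 100

# Two closed ranges overlap only if each starts before the other ends.
ranges_overlap = new_min <= orig_max and orig_min <= new_max

print(f"{speedup_pct:+.1f}%")  # → +3.7%
print(ranges_overlap)          # → False
```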

Correctness

  • Node-count identical: every position at every depth produces the same node count between original and patched (123,142 across the depth-5 suite, both versions).
  • Perft: matches across 6 standard positions at depth 3 — startpos = 8902, Kiwipete = 97862, position 3 = 2812, position 4 = 9467, promotion test = 62379, middlegame = 89890. Per-root-move breakdowns are also identical.
  • Mate-in-1 puzzles: first 8 entries from tools/test_files/mate1.fen produce identical bestmove between original and patched.

Notes on minimalism

build/clean.sh sunfish.py | wc -l is unchanged: 3 lines swapped 1:1 in gen_moves. The diff also adds a 3-line # NB: comment explaining why the literal form is used (so a future reader doesn't "clean up" back to method calls); comments are stripped by clean.sh so the 131-line claim is unaffected.

Test plan

  • Run perft_ab.py (modified vs original) at depth 3 across 6 positions — totals and per-root-move breakdowns identical.
  • Run mate-in-1 cross-check on first 8 mate1.fen positions — identical bestmove.
  • Run bench.py at depth 4 and depth 5, 5 runs each — speedup is reproducible.
  • (Upstream maintainer can verify) tools/quick_tests.sh end-to-end.

Profiling shows `gen_moves` accounts for ~67% of CPython search time
(measured via `cProfile` on a 5-ply search from startpos). Inside that
function, `q.isupper()`, `q.isspace()`, and `q.islower()` are called
millions of times per search. In CPython each one is a Python-level
attribute lookup plus a C call.

Substring containment against a small literal is meaningfully faster:

    timeit on 120-char board scan, 100k iters:
      c.isupper()                  3.33 us
      c in "PNBRQK"                2.61 us   (~22% faster)
      c.isspace() or c.isupper()   4.97 us
      c in " \nPNBRQK"             3.34 us   (~33% faster)

End-to-end across a 6-position suite at depth 5 (5 runs each, same
machine):

      original   16,711 - 17,240 nps   (mean 17,030)
      patched    17,592 - 17,685 nps   (mean 17,656)
      speedup    +3.7% (ranges don't overlap)

The search tree is unchanged: node counts are identical on every
position at every depth, perft matches across 6 positions at depth 3
(startpos=8902, kiwipete=97862, ...), and the first 8 mate-in-1 puzzles
from tools/test_files/mate1.fen produce identical `bestmove` output.

After cleanup the line count is also identical (3 functional lines
swapped 1:1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>